Abstract


A  system  was  developed  to  investigate  the  data  rate 
necessary  to  transmit  speech  using  a  rule  based  sinusoidal 
model.  The  system  consists  of  a  speech  analyzer  and  a 
synthesizer.  The  analyzer  outputs  discrete  frequencies  and 
quantized  amplitudes  and  phases  of  selected  speech  spectral 
components.  The  synthesizer  reconstructs  speech  from  these 
components  based  on  a  sinusoidal  model.  The  selection  of 
spectral  components  for  voiced  speech  regions  is  based  on 
the  detection  of  harmonics  of  the  fundamental  frequency.  To 
obtain  a  specific  number  of  spectral  components,  a  variable 
amplitude  threshold  is  applied  to  the  detected  harmonics  and 
their nearest neighbors. For unvoiced regions only the
variable amplitude threshold step is applied. The lowest data
rate obtained for toll quality speech was about 18 Kbps. This
system was implemented in Fortran 77 on a VAX 11/780
computer. Visual analysis of speech was provided by the
software package SPIRE (Speech and Phonetics Interactive
Research Environment).


RULE BASED SINUSOIDAL ENCODING OF SPEECH


I.  Introduction 


The  importance  of  digital  speech  transmission  systems 
has  increased  over  the  last  few  years  because  they  offer  two 
considerable  advantages  over  analog  systems.  The  first  is 
the  replacement  of  major  portions  of  analog  circuitry  by 
digital  integrated  circuits,  which  increases  reliability  and 
stability  (13:384).  The  second  is  a  very  low  error  rate.  In 
long distance communications, several repeater stations are
used to compensate for attenuation in the line; analog
repeaters accumulate errors, because they do not compensate
exactly for line attenuation that varies with frequency.
Digital regenerators, on the other hand, are immune to
cumulative noise (14:78).

These and some other advantages related to digital design
flexibility are somewhat offset by the larger channel
bandwidth needed to transmit digital speech. Transmission
lines, microwave links, etc., are band limited by nature,
making bandwidth a very important economic factor and the
search for lower-bandwidth digital speech channels
worthwhile. Theoretically, smaller bandwidths can be obtained
by using an efficient code to reduce speech redundancy.

Background 

Currently  there  are  several  systems  suitable  for 
digital  speech  transmission.  These  systems  can  be  roughly 
divided  into  two  categories  depending  on  the  encoding  scheme 
used.  When  they  generate  an  approximation  to  the  input,  based 
on  minimizing  the  distance  between  signal  and  approximation, 
they  are  usually  called  waveform  encoders  (11:649).  On  the 
other  hand,  systems  that  reconstruct  the  speech  signal  based 
on  the  magnitudes  of  the  signal  short-term  spectrum  are 
called  analysis/synthesis  coders  (11:649). 

Pulse  Code  Modulation  (PCM)  is  an  example  of  a  waveform 
encoder  that  is  in  use  world-wide  for  speech  transmission. 
The  PCM  codec  (coder-decoder)  samples  the  speech  signal  at 
8000  samples  per  second,  and  produces  a  7  or  8  bit  number  per 
sample, which yields a data rate of 56 or 64 Kbps. To reduce
this rate, some systems were proposed that use past data
to predict the next sample value. These schemes are called
n-tap Differential Pulse Code Modulators (DPCM), and use an
n-tap Linear Prediction Coding (LPC) filter, where n
represents the number of past values used to make the
prediction.

The data rate reduction is accomplished by transmitting
the difference between the prediction and the actual sample
value. The difference signal has a smaller variance than the
original signal, which justifies the smaller data rate
obtained (11:625). The bit rate reduction, for a fixed output
signal-to-noise ratio (SNR), is usually measured by the
prediction gain, defined as the ratio of input to output
variances; DPCM coders have prediction gains on the order of
6 to 8 dB (11:633).
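
The prediction gain defined above can be illustrated with a
minimal sketch. It is written in Python for brevity (the thesis
software was in Fortran 77), and it assumes an open-loop 1-tap
predictor with a hypothetical coefficient of 0.9 and no
quantizer; the function name and the test tone are illustrative
only. A pure tone is far more predictable than speech, so the
gain it yields is not representative of the 6 to 8 dB quoted for
DPCM coders.

```python
import math

def prediction_gain_db(signal, coeff=0.9):
    """Ratio of input variance to difference-signal variance, in dB.

    Assumption: open-loop 1-tap predictor x_hat[k] = coeff * x[k-1].
    """
    def variance(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / len(xs)

    # Difference between each sample and its prediction from the previous one.
    diff = [signal[k] - coeff * signal[k - 1] for k in range(1, len(signal))]
    return 10.0 * math.log10(variance(signal[1:]) / variance(diff))

# Hypothetical test signal: a 200 Hz tone sampled at 8000 samples per second.
tone = [math.sin(2 * math.pi * 200 * k / 8000) for k in range(800)]
gain = prediction_gain_db(tone)
```

With the coefficient set to zero the difference signal equals the
input, so the gain is 0 dB; any useful predictor must do better
than that.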

Some other techniques improve this gain by adding data
dependent adjustments to the coefficients of the LPC filters.
For example, the International Telegraph and Telephone
Consultative Committee (CCITT) has set a 32 Kbps Adaptive
Differential Pulse Code Modulator (ADPCM) as a standard for
toll-quality speech (11:639). It is worth noting that the
CCITT specifies a 64 Kbps PCM channel for speech in its
Integrated Services Digital Network (ISDN) standard, although
32 Kbps is already possible, and 16 Kbps will be enough by
the time ISDN is completely implemented (14:108).

In the category of analysis/synthesis coders, the most
representative examples are the vocoder (voice coder) and the
Linear Predictive Coder (LPC). The vocoder models the speech
mechanism. A typical vocoder consists of several narrow band
filters that estimate the power spectrum of the speech
signal, and of an excitation estimator to decide if the
signal is periodic (voiced speech) or turbulent (unvoiced
speech). The spectral amplitudes and the lower band of
frequencies that contain pitch information are transmitted
along with the voiced/unvoiced decision. The spectral
amplitudes set the gain of the filter banks in the
receiver (decoder), which are driven by a noise generator in
the case of unvoiced sounds or by the lower band of voice
frequencies (using a non-linear spectral expansion process).
The total bandwidth required to transmit vocoder speech can
be reduced to 700 Hz or less (11:652). This implies 11.2 Kbps
if the transmitted information is sampled at the Nyquist rate
of 1400 samples per second and digitized with 8 bits per
sample. The well known KY-585 vocoder transmitted
intelligible, but low quality speech at 2400 bps.

The  Linear  Prediction  Coder  (LPC)  makes  use  of  a  scheme 
similar  to  ADPCM,  but  instead  of  transmitting  the  difference 
between  the  predicted  and  actual  value,  it  transmits  the 
filter  coefficients  that  are  used  to  make  the  prediction,  and 
the voiced/unvoiced excitation. The rationale is that
if an optimal algorithm is used to predict the values then
the difference signal is essentially zero, and need not be
transmitted.

In the area of speech enhancement, recent work at AFIT
produced an analysis/synthesis coder with a very high quality
of output speech (5). Essentially, this system reconstructed
speech from selected amplitude and phase spectral components
of time slices of the input speech. The number of components
required (54 to 85) to obtain high quality reconstructed
speech motivated the investigation presented in this thesis.

Problem


The  purpose  of  this  thesis  is  to  implement  a  rule  based 
system  for  speech  analysis  and  synthesis,  using  processing 
techniques  developed  around  a  sinusoidal  speech  model  and  to 
investigate  and  minimize  the  data  rate  necessary  for  digital 
speech transmission. The system should be flexible enough to
allow quick changes to the rules for selecting the frequency
components that characterize speech, and should also permit
the variation of the number of bits used to quantize those
components.

Scope 

The speech is reconstructed from the frequency,
quantized amplitude and phase of selected spectral
components. The speech spectrum, on which the rule based
selection takes place, was obtained by taking a 512-point
Discrete Fourier Transform (DFT) of speech time slices. To
obtain a fixed frame length for transmission, a
post-selection was done by a variable threshold.

Approach 

The  overall  system  can  be  divided  into  two  subsystems: 
the  speech  analyzer  and  the  speech  synthesizer.  The  approach 
for the speech analyzer is outlined as follows. First,
Hamming windows with 50% overlap are applied to the speech
time frames; the induced redundancy doubles the total
amount of data processed. Then a 512-point DFT is used to
perform the conversion to the frequency domain. In the
frequency domain the components that characterize the speech
are selected based on a heuristic rule. Finally, the
amplitudes and phases of the selected components are
quantized and packed into vector form, and a frame with
frequency information about these components is generated.

The  synthesizer  portion  of  the  system  is  outlined  as 
follows.  First  the  reduced  frames  containing  the  quantized 
amplitudes  and  phases  are  expanded  to  their  original  size, 
by  using  the  frequency  vector  information.  Then,  a  time 
domain  conversion  algorithm  is  used,  and  the  resultant  wave 
shapes  are  averaged  to  obtain  the  original  number  of  frames. 
Finally,  the  time  waveform  is  amplitude  normalized  to  drive 
a  digital-to-analog  converter. 

Sequence  of  Presentation 

Chapter  Two  presents  the  hardware  and  software 
environment  used  to  process  and  analyze  speech  waveforms. 

Chapter  Three  describes  the  entire  system.  The  modules 
are  functionally  analyzed  and  details  for  the  algorithms  used 
are  given. 

Chapter Four presents the results. Input and output
time waveforms, and respective spectrograms are shown, and
a qualitative evaluation is given for the different data
rates obtained by varying the number of selected components,
or the number of quantization bits used for the amplitudes
and phases.

Chapter  Five  provides  conclusions  and  recommendations 
for  upgrading  and  using  this  system,  and  the  Appendices 
provide  source  code  listings,  data  flow  diagrams  and 
additional results.

II. Acoustic Processing Environment

The  purpose  of  this  chapter  is  to  describe  the 
software  and  hardware  tools  used  to  implement  and  test  the 
system.  The  first  section  describes  the  speech  digitizing 
system. Then the programming development environment is
described. The last section discusses SPIRE, a speech
analysis tool that was used to visualize speech files.

Speech  Digitizing  System  (1) 

The Digital Sound Corporation DSC-200 was the
analog-digital converter used to digitize and play back the
input and output speech files, respectively. This digitizer
has the capability of sampling audio signals at a maximum
rate of 5 KHz, and produces a speech file consisting of
consecutive 256-point arrays with integer values from
-32,768 to 32,767. The sampling rate used was 8 KHz, yielding
a speech frame length of 32 msec. The resultant digitized
speech was stored in a VAX 11/780 system for further
processing.

Software Development


The software was written in Fortran 77 on a VAX
11/780 system, under the VMS operating system. Besides the
VAX 11/780, the program was sometimes executed on a MicroVAX
III machine. Structured programming techniques were used to
write the source code. Data flow diagrams and source code
are presented in Appendices C and D, respectively.

SPIRE  (12) 

Speech  and  Phonetics  Interactive  Research  Environment 
(SPIRE)  is  a  software  package  running  on  a  3600  LISP  machine. 
SPIRE  allows  the  user  to  interactively  examine  and  process 
speech  by  generating  several  bit-mapped  displays  with 
resolutions  of  1280  by  760  pixels  or  1216  by  773  pixels. 
Among  the  various  displays  that  SPIRE  generates,  time 
waveforms  and  narrow-band  spectrograms  were  the  most 
frequently used. They helped in system development and
provide an illustration of the results obtained. All the
displays are synchronized and the narrow-band spectrogram
uses a bandwidth of 78 Hz.

III. Speech Processing System


The  processing  system  consists  of  a  speech  analyzer 
and  a  speech  synthesizer  whose  main  functions  are  described 
in the following subsections. First, the calculation of
the data rate from the system parameters is presented.

The data rate is determined by the flow of information
from the analyzer to the synthesizer. This information is
contained in three frames, i.e. arrays, with length set a
priori as an input parameter. One frame contains the spectral
frequencies, and the other two the spectral amplitudes and
phases. The number of bits used per frame slot is fixed for
the frequency frame (8 bits), and is determined by the number
of quantization levels for the other two frames. We must
also consider that these frames need to be transmitted in
half the time of the input speech frame, because overlapping
the windows in the analyzer has doubled the amount of data
to be processed. These considerations yield the following
expression for the data rate:

$$\text{Data Rate} = \frac{2L(A+P+8)}{T} \tag{3.1}$$

where

L = frame length (number of slots in a frame)
A = number of amplitude quantization bits
P = number of phase quantization bits
T = duration of speech time frame (32 ms)
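
As a quick check of equation 3.1, the sketch below evaluates it
in Python (chosen for brevity; the thesis code was Fortran 77).
The function and parameter names are illustrative. The first set
of values is the lowest-rate configuration reported in Chapter
Four (L = 18, A = 6, P = 2); the second is the baseline used for
the harmonic search comparison (L = 64, A = P = 12).

```python
def data_rate_bps(frame_length, amp_bits, phase_bits,
                  frame_seconds=0.032, freq_bits=8):
    """Equation 3.1: 2 * L * (A + P + 8) / T, in bits per second."""
    bits_per_slot = amp_bits + phase_bits + freq_bits
    # The factor 2 accounts for the 50% window overlap in the analyzer.
    return 2 * frame_length * bits_per_slot / frame_seconds

lowest = data_rate_bps(18, 6, 2)       # about 18 Kbps
baseline = data_rate_bps(64, 12, 12)   # about 128 Kbps
```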

Speech  Analyzer 

A simplified block diagram is shown in Figure 3.1.
A more detailed representation is provided in the form of
data flow diagrams in Appendix C. The analyzer receives its
input from a digitized speech file produced by the DSC-200
with the sampling rate set to 8 KHz. This input is Hamming
windowed into 256-point frames, each representing 32 ms of
speech. Except for the first frame, each time frame is
completely processed before another frame is input. The
exception results from the need to generate overlapping data
for processing and has two main consequences: the first is
that the total number of speech frames is approximately
doubled; the second is that one frame, or 32 ms, of intrinsic
delay is generated between input and output.

After being windowed, a speech frame is converted to
the frequency domain by a DFT algorithm. The resulting
spectral amplitudes, after being processed by a selection
routine, are packed into a reduced length frame and
quantized. The corresponding phases are also quantized and
a frame containing frequency information is generated.

Windowing. To reduce the frequency ringing effects
near discontinuity points (Gibbs phenomenon) associated with
rectangular windows (3:59), a Hamming window is used to
sample the input signal. This window offers a good compromise
between dynamic range and transition time, which allows good
resolution of closely spaced frequency tones having great
differences in amplitude. The Hamming window is defined by
the following equation:

$$W_{Ham}(i) = 0.54 - 0.46\cos\left[\frac{2\pi i}{L-1}\right] \tag{3.2}$$

where

L = frame length (256)
i = 0, 1, 2, ..., (L-1)


To avoid loss of data from the input time series, due
to small values of the Hamming window near the boundaries,
an overlapping scheme of 50% is used before the window is
applied (3:56). This overlapping, illustrated in Figure 3.2,
increases the number of input frames from n to 2n-1.
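
The windowing step can be sketched as follows, in Python rather
than the Fortran 77 of the thesis. The function names are
illustrative, and the sketch assumes the whole input is available
as one list whose length is a multiple of the frame length; the
thesis processes frames as they arrive.

```python
import math

def hamming(length=256):
    """Hamming window of equation 3.2."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
            for i in range(length)]

def overlapped_frames(samples, length=256):
    """Split samples into Hamming-windowed frames overlapping by 50%."""
    hop = length // 2                     # 128-sample step gives 50% overlap
    window = hamming(length)
    frames = []
    for start in range(0, len(samples) - length + 1, hop):
        frames.append([samples[start + i] * window[i] for i in range(length)])
    return frames

# Four input frames (n = 4, 1024 samples) become 2n-1 = 7 windowed frames.
frames = overlapped_frames([1.0] * 1024)
```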


Figure  3.2.  Hamming  Windows  with  50%  Overlap 


Discrete Fourier Transform. To convert the speech to
the frequency domain, a 512-point Fast Fourier Transform
(FFT) (6:457) was taken of the 256-point time series padded
with 256 zeros. The real and imaginary components out of the
FFT were used to calculate the spectral amplitudes and
phases. The result is two 256-point frames, one of amplitudes
and one of phases, with a resolution of 15.625 Hz. Figure 3.3
illustrates this process.


[Block diagram: 256 samples -> 512-point DFT -> 256 frequency amplitudes, 256 frequency phases]

Figure 3.3. Discrete Fourier Transform
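
The conversion to amplitude and phase frames can be sketched as
below. This is a direct (slow) DFT in Python using only the
standard library, not the FFT routine of the thesis; the function
name is illustrative. Because the 256 padded zeros add nothing to
the sum, the loop runs over the 256 real samples only, while the
bin spacing is still that of a 512-point transform (8000 / 512 =
15.625 Hz).

```python
import cmath
import math

def spectrum(frame, fft_size=512):
    """Amplitudes and phases of the first fft_size/2 bins of the
    zero-padded DFT of frame (direct evaluation, for illustration)."""
    amplitudes, phases = [], []
    for k in range(fft_size // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / fft_size)
                  for n, x in enumerate(frame))
        amplitudes.append(abs(acc))
        phases.append(cmath.phase(acc))
    return amplitudes, phases

# Hypothetical test frame: a 1000 Hz cosine sampled at 8 KHz.
# 1000 Hz / 15.625 Hz per bin places the peak at bin 64.
tone = [math.cos(2 * math.pi * 1000 * n / 8000) for n in range(256)]
amps, phs = spectrum(tone)
peak_bin = amps.index(max(amps))
```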


Harmonics/Peaks selection. To reduce the number of
spectral components that characterize speech, a rule based
selection was carried out on the amplitude frame generated
by the DFT. First, the energy of the frame was computed to
determine if the frame contained voiced or unvoiced speech
(5:6). If the energy was above a certain threshold level,
then the frame was considered to contain voiced speech. For
the speech files that were processed, this level was
empirically set to 10^6. Based on this classification,
different selection rules were applied.

The selection rules for voiced speech make use of the
fact that voiced sounds are periodic (9:4). This implies that
the frequency spectrum is composed of harmonics of the
fundamental frequency of the time waveform, also known as the
glottal pitch frequency, and makes it possible to reconstruct
speech from these harmonics (7:27.6.2). A by-product
of selecting harmonics is the elimination of non-harmonic
noise.

Because speech synthesized from the exact harmonics
appears to have a musical noise (2:3-6), all the rules
developed were based on the selection of the highest
amplitude frequency in a frequency region containing the
harmonic and a few of its neighboring frequencies. The
different rules tried differ in the way the glottal frequency
was determined, the way it was used to search for harmonics,
and the number of neighbors used. One of the rules tried
was used by Bashir in his master's thesis: the number of
neighbors of the harmonic was set to two, the glottal
frequency was fixed at 125 Hz (a value characteristic of male
speech) and the next harmonic was determined by adding the
glottal frequency to the frequency of the highest amplitude
neighbor. This allowed local and global adjustments to the
harmonic frequency.

In some other rules developed and tried, the glottal
frequency was not fixed, but was set from frame to frame to
the frequency of maximum amplitude in a region where the
glottal frequency was expected to lie, generally from 125 to
187.5 Hz (4). Also tried was harmonic selection in the
neighborhood of multiples of the glottal frequency, thus
allowing only local variations in the harmonic position, and
not global ones as in Bashir's thesis.

Subjective auditory tests determined the rule that
was used to produce the results presented in Chapter Four.
The number of neighbors was set to four, the glottal pitch
frequency was set to the maximum amplitude frequency in the
region 125 to 187.5 Hz, and this value was used to step
through the spectrum in search of the harmonics, thus
allowing local and global variations in the harmonic
position.
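
One possible reading of this rule is sketched below in Python
(the thesis code was Fortran 77). Several details are
assumptions, not taken from the text: "four neighbors" is read as
two bins on each side of the expected harmonic, the 125 to 187.5
Hz region is mapped to bins 8 to 12 of the 15.625 Hz grid, and
the function name is illustrative.

```python
def select_harmonics(amplitudes, neighbors=4, low_bin=8, high_bin=12):
    """Keep one peak per harmonic region; zero every other bin."""
    half = neighbors // 2
    # Pitch estimate: strongest bin in the expected pitch region.
    pitch = max(range(low_bin, high_bin + 1), key=lambda k: amplitudes[k])
    selected = [0.0] * len(amplitudes)
    center = pitch
    while center < len(amplitudes):
        lo = max(center - half, 0)
        hi = min(center + half, len(amplitudes) - 1)
        # Local adjustment: take the strongest bin near the expected harmonic.
        peak = max(range(lo, hi + 1), key=lambda k: amplitudes[k])
        selected[peak] = amplitudes[peak]
        # Global adjustment: step from the peak actually found.
        center = peak + pitch
    return selected

# Hypothetical frame: slightly inharmonic peaks plus one stray component.
frame = [0.0] * 256
frame[10], frame[21], frame[31], frame[15] = 9.0, 7.0, 5.0, 4.0
kept = select_harmonics(frame)
```

In this example the peaks at bins 10, 21 and 31 are kept, while
the stray component at bin 15 lies outside every search region
and is discarded.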

For the unvoiced regions no selection rule was
developed, even though a separate software module was
designed to allow a selection rule to be added in future use
of the system. The selection of peaks in an unvoiced frame is
performed only by the variable threshold module, which
selects the peaks with amplitudes above the threshold.

Variable Threshold. This module sets up a variable
threshold to perform post-selection on the harmonic or peak
frame generated by the above process. Its implementation
makes use of the fact that the energy in speech falls off at
a rate of 6 dB per octave after about 625 Hz (10:651); this
threshold was approximated by the following equation:

$$\text{Threshold} = \frac{a}{\left[1 + (i/42)^2\right]^{1/2}} \tag{3.3}$$

The number of frequency amplitudes above the threshold
depends on the value used for a. A successive approximation
algorithm increases or decreases the value of a until the
number of components above the threshold equals the desired
value L. All the components below the threshold are set to
zero; therefore the resulting number of amplitude frequencies
in a frame, after selection and post-selection, equals the
input parameter L.


Figure  3.4.  Variable  Threshold 
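
A minimal sketch of the post-selection follows, in Python rather
than Fortran 77. It assumes that i in equation 3.3 is the bin
index of the 256-point amplitude frame, and it uses bisection as
the successive approximation; the thesis does not say which
search was used, and the function names are illustrative.

```python
def threshold_curve(a, size=256, corner=42):
    """Equation 3.3 evaluated at every bin index i."""
    return [a / (1.0 + (i / corner) ** 2) ** 0.5 for i in range(size)]

def count_above(amplitudes, a):
    curve = threshold_curve(a, len(amplitudes))
    return sum(1 for x, t in zip(amplitudes, curve) if x > t)

def post_select(amplitudes, wanted, iterations=60):
    """Adjust a until `wanted` components exceed the threshold,
    then zero everything at or below it."""
    size = len(amplitudes)
    lo = 0.0
    # Upper bound chosen so that no component can exceed the threshold.
    hi = max(amplitudes) * (1.0 + ((size - 1) / 42) ** 2) ** 0.5 + 1.0
    a = hi
    for _ in range(iterations):
        a = (lo + hi) / 2
        n = count_above(amplitudes, a)
        if n == wanted:
            break
        if n > wanted:
            lo = a      # too many components survive: raise the threshold
        else:
            hi = a      # too few survive: lower the threshold
    curve = threshold_curve(a, size)
    return [x if x > t else 0.0 for x, t in zip(amplitudes, curve)]

# Hypothetical frame shaped so that bins 0..17 are the strongest
# relative to the threshold curve.
frame = [(256 - k) * t for k, t in enumerate(threshold_curve(1.0))]
kept = post_select(frame, wanted=18)
```

If several components sit exactly on the same threshold value, no
value of a yields the requested count; the sketch then simply
stops after the iteration limit.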


Amplitude, Frequency and Phase Frame Generation. The
selected amplitude components, originally in a 256-point
array, are packed into an array of length L. The
frequency of each amplitude component is written into another
complementary array with the same length. This frequency
frame indexes the phase frame generated by the DFT, and the
phases corresponding to the selected amplitudes are also
written to a reduced frame of length L. The two reduced
frames with amplitudes and phases are passed to the
respective quantizer modules and the frequency frame is
output to the synthesizer.

Amplitude Quantizer. The quantizer implemented is
uniform with midtread at the origin; it uses quantization
intervals determined by the number of bits needed to encode
the total number of intervals and by the maximum amplitude
level it can handle without saturating. Both the number of
bits (A in equation 3.1) and the dynamic range are inputs to
the system. The maximum amplitude level was set to 6 x 10^5.
This value was experimentally determined, for the speech
files used, as a compromise between the best use of the
dynamic range and less operating time in the saturation
region.

[Quantizer characteristic; axis labels: X, Vmax]

Figure 3.5. Amplitude Quantizer
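
A uniform midtread quantizer of this kind might look like the
Python sketch below. The exact level layout of the thesis
quantizer is not given, so the sketch assumes nonnegative
spectral amplitudes, 2^A - 1 equal steps between zero and the
saturation level, and rounding to the nearest level; the function
names are illustrative.

```python
def quantize_amplitude(x, bits, vmax=6e5):
    """Return the integer code of x; zero is an output level (midtread)."""
    levels = 2 ** bits - 1
    step = vmax / levels
    code = int(x / step + 0.5)            # round to the nearest level
    return min(max(code, 0), levels)      # saturate outside [0, vmax]

def dequantize_amplitude(code, bits, vmax=6e5):
    """Map an integer code back to an amplitude."""
    return code * (vmax / (2 ** bits - 1))

code = quantize_amplitude(123456.0, bits=6)
restored = dequantize_amplitude(code, bits=6)
```

Inside the dynamic range the reconstruction error is at most half
a step; above the saturation level the code is clamped to the
largest value.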


Phase Quantizer. The phase quantizer is also a
linear quantizer, with the levels uniformly distributed
between π and -π, and with midtread at the origin. The number
of levels is determined by the input parameter P (see
equation 3.1).

Figure 3.6. Phase Quantizer


Speech Synthesizer

This subsystem reconstructs speech from the three
frames of length L generated by the speech analyzer,
containing the frequencies and the quantized amplitudes and
phases. The reconstruction is based on the fact that speech
can be represented as a sum of sinusoidal waveforms (8:489).
After conversion to the time domain, a data reduction
algorithm is used to restore the original number of frames.
Recall that the initial windowing process has expanded the
number of speech frames from n to 2n-1. The following
discussion analyzes the modules that constitute the
synthesizer. These modules are depicted in Figure 3.7.

Figure 3.7. Speech synthesizer

Frame  expansion.  This  module  expands  the  reduced 
amplitude  and  phase  frames  to  the  original  size.  By  making 
use  of  the  frequency  frame,  the  amplitudes  and  phases  are 
written  in  the  proper  locations  of  two  256-point  arrays. 
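
The packing done in the analyzer and the expansion done here are
inverse operations, as the Python sketch below illustrates (the
thesis code was Fortran 77; the function names are illustrative).
It assumes the frequency frame stores bin indices of the
256-point arrays, which would fit the 8 bits per slot that
equation 3.1 assigns to it.

```python
def pack_frames(amplitudes, phases):
    """Analyzer side: keep only the selected (nonzero) amplitude bins."""
    freq = [k for k, a in enumerate(amplitudes) if a > 0.0]
    return freq, [amplitudes[k] for k in freq], [phases[k] for k in freq]

def expand_frames(freq, amps, phs, size=256):
    """Synthesizer side: write values back to their original locations."""
    full_amp, full_phase = [0.0] * size, [0.0] * size
    for k, a, p in zip(freq, amps, phs):
        full_amp[k], full_phase[k] = a, p
    return full_amp, full_phase

# Hypothetical frame with two selected components.
amp_frame = [0.0] * 256
amp_frame[10], amp_frame[21] = 5.0, 3.0
phase_frame = [0.01 * k for k in range(256)]
freq, amps, phs = pack_frames(amp_frame, phase_frame)
full_amp, full_phase = expand_frames(freq, amps, phs)
```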

Time Domain Conversion. The expanded amplitude and
phase frames are used to perform the time domain conversion,
according to the following expression:

$$s(t) = \sum_i a_i \cos(2\pi f_i t + \theta_i) \tag{3.4}$$

where

a_i = value of the ith location of the amplitude frame
f_i = frequency corresponding to the ith location
θ_i = value of the ith location of the phase frame
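
Equation 3.4 can be evaluated directly, as in the Python sketch
below (illustrative names; the thesis code was Fortran 77). It
assumes that the frequency of location k is k times the 15.625 Hz
bin spacing and that t is sampled at 8 KHz, so the product of
frequency and time reduces to k * n / 512 for sample n. The
output is left unscaled, since amplitude normalization is a
separate module.

```python
import math

def synthesize(freq_bins, amps, phases, length=256, fft_size=512):
    """Sum of sinusoids of equation 3.4 for one frame of samples."""
    out = []
    for n in range(length):
        out.append(sum(a * math.cos(2 * math.pi * k * n / fft_size + p)
                       for k, a, p in zip(freq_bins, amps, phases)))
    return out

# One component at bin 64 (1000 Hz) with unit amplitude and zero phase
# gives cos(pi * n / 4): +1 at n = 0, -1 at n = 4, +1 at n = 8.
wave = synthesize([64], [1.0], [0.0])
```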

Data Reduction. This module makes use of local memory
to reduce the number of speech frames from the value
produced by 50% overlapping windows to the original value
n. A Hamming window is applied to all input frames. Then, if
the frame being processed is odd, its first 128 locations are
replaced by their sum with the last 128 locations
of the preceding (and already Hamming windowed) even frame.
The resultant frame is stored in local memory, waiting for
the next incoming even frame. When this frame arrives, the
last 128 positions of the stored frame are replaced by
their sum with the first 128 positions of the incoming even
frame. Then the stored frame is output. Figure 3.8
illustrates this process.
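
The procedure amounts to an overlap-add of the half-shifted
frames onto the aligned ones. The Python sketch below is a batch
version under that reading: it takes all 2n-1 frames at once
instead of streaming them through local memory as the thesis
module does, and its names are illustrative. Frames are numbered
from 1 in the text, so the odd frames are those at even list
positions.

```python
def reduce_frames(frames, window):
    """Overlap-add 2n-1 half-overlapped frames into n output frames."""
    half = len(window) // 2
    shaped = [[x * w for x, w in zip(f, window)] for f in frames]
    outputs = []
    for j in range(0, len(shaped), 2):        # odd frames 1, 3, 5, ...
        frame = list(shaped[j])
        if j > 0:
            # Add the last half of the preceding even frame.
            for i in range(half):
                frame[i] += shaped[j - 1][half + i]
        if j + 1 < len(shaped):
            # Add the first half of the following even frame.
            for i in range(half):
                frame[half + i] += shaped[j + 1][i]
        outputs.append(frame)
    return outputs

# Three frames of ones with a flat test window (not the Hamming window)
# make the overlap visible: overlapped halves sum to 2.
reduced = reduce_frames([[1.0] * 256] * 3, [1.0] * 256)
```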


Figure 3.8. Data reduction

Amplitude Normalization. Here the incoming speech
frames from the Data Reduction module are written to a file
until the system has finished processing its input. Then
this file is amplitude normalized to drive the DSC-200 and
produce listenable speech. The normalization process is
essentially the same as used by Bashir in his master's thesis.
The speech amplitudes are multiplied by 32767, and divided
by an estimated maximum amplitude value. This carries out the
conversion to the integer*2 data type needed to drive the
digital/analog converter, and makes the output volume
independent of the input level (2:3-14).
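
The normalization can be sketched in Python as follows (the
thesis code was Fortran 77; names are illustrative). How the
maximum amplitude was estimated is not described, so the sketch
assumes the true peak of the file is used unless an estimate is
supplied, and it clamps the result to the 16-bit integer range in
case the estimate is too small.

```python
def normalize_to_int16(samples, max_estimate=None):
    """Scale samples so the estimated peak maps to 32767."""
    peak = max_estimate if max_estimate else max(abs(s) for s in samples)
    if peak == 0:
        return [0] * len(samples)         # silent input stays silent
    scaled = []
    for s in samples:
        value = int(round(s * 32767 / peak))
        scaled.append(max(-32768, min(32767, value)))
    return scaled

ints = normalize_to_int16([0.5, -1.0, 0.25])
```
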

IV. Results


The search for the lowest data rate possible with
this system began with the determination of the best
algorithm to look for harmonics in the voiced regions. Then
the effects on the reconstructed waveform of changing either
the number of quantization levels or the frame length were
observed. The results obtained by this process were mainly
determined by subjective listening tests, even though the
visualization of narrow-band spectrograms and time waveforms
of reconstructed and original speech also played a
preliminary role in that determination. The system was tested
with different input speech files, but the results presented
were all obtained with the input file SND. The following
sections analyze each step of the above process.

Harmonic  Search  Algorithm 

The different algorithms tested were based on
variations of the scanning mechanism to find harmonics, and
on the number of selected components in the vicinity of the
harmonic. Two different searching methods were used. The
first is based on an estimate of the pitch frequency (150
Hz), and uses this value to jump to the location of the next
harmonic in the amplitude frame. The other evaluates the
pitch frequency (and thus the jump value) in a neighborhood
of 150 Hz, thus allowing variations in the pitch frequency
from frame to frame. In this case, the pitch value
corresponds to the frequency of the highest amplitude in the
neighborhood. The neighborhood was set to 2, 4 or 6 neighbors
of the harmonic, in both cases. Figures 4.1 to 4.6 present
the results obtained by the two mechanisms, using constant
values for frame length (64), number of amplitude bits (12),
and number of phase bits (12). Figures 4.1 and 4.2 show the
original (SND) and reconstructed waveforms using fixed pitch
(2NFP, 4NFP, 6NFP). Figure 4.3 shows the corresponding
narrow-band spectrograms. The reconstructed files obtained
with variable pitch (2NVP, 4NVP, 6NVP) are shown in Figures
4.4 to 4.6. The best reconstruction of speech was considered
to be done by the fixed pitch algorithm with 4 neighbors.
This configuration was then used in the determination of the
number of quantization levels and the frame length that
yield the lowest possible data rate for quality speech
transmission.

[Waveform panels: SND, 2NFP, 4NFP, 6NFP; time axis 0.0000 to 2.0000]

Figure 4.1. Time waveforms. Original waveform
SND, and reconstructed waveforms using fixed
pitch. The number of neighbors was set to two
(2NFP), four (4NFP), and six (6NFP).

[Waveform panels: SND, 2NFP, 4NFP, 6NFP; time axis 0.7862 to 0.8362]

Figure 4.2. Detailed Time Waveforms. Same as
last figure, but with increased detail.

[Narrow-band spectrogram panels: one unlabeled panel and 6NFP; time axis 0.0000 to 1.5196]

Figure 4.3. Narrow-band Spectrograms. These
spectrograms correspond to the waveforms shown
in Figure 4.1.

[Waveform panels: SND, 2NVP, 4NVP, 6NVP; time axis 0.0000 to 2.0000]

Figure 4.4. Time Waveforms. Original waveform
SND, and reconstructed waveforms using
variable pitch. The number of neighbors was
set to two (2NVP), four (4NVP), and six
(6NVP).

[Waveform panels: SND, 2NVP, 4NVP, 6NVP; time axis 0.7862 to 0.8362]

Figure 4.5. Detailed Time Waveforms. Same as
last figure but with increased detail.

[Narrow-band spectrogram panels: SND, 6NVP; time axis 0.0000 to 1.5196]

Figure  4.6.  Narrow-band  Spectrograms . These 
spectrograms  correspond  to  the  waveforms  shown 
in  Figure  4.4. 
Quantization Levels and Frame Length

The effects of the quantization levels and the frame
length on the data rate are expressed in equation 3.1 through
the number of amplitude bits (A), phase bits (P), and frame
length (L). The minimum data rate for which the
reconstructed waveform gave acceptable results in listening
tests was obtained with six amplitude bits, two phase bits,
and a frame length of eighteen, yielding a data rate of
18 Kbps. This waveform (4NFP18L6A2P) is compared with the
original waveform (SND) in Figures 4.7 and 4.8. Figures 4.9
to 4.20 show intermediate results, obtained by varying one
of the above parameters while holding the others constant.
When the fixed parameter was A or P, its value was set to 12;
when it was the frame length L, its value was set to 64.
Figures 4.9 to 4.11 present the effect of the number of
amplitude quantization bits. Figures 4.12 to 4.14 show the
effect of the number of phase bits. Figures 4.15 to 4.20 show
the effect of the length of the frame. Appendices A and B
expand these results by showing waveforms and narrow-band
spectrograms of reconstructed speech files for different
numbers of amplitude bits (8, 6, 4, 2), different numbers of
phase bits (6, 4, 2), and different frame lengths (32, 24,
22, 20, 18, 16, 12).
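
The arithmetic behind the 18 Kbps figure can be checked with a
short sketch. It follows the computation in the `datarate` routine
of Appendix D rather than quoting equation 3.1 directly: two frames
per block, 8 bits for each frequency index, and a divisor of 32.
Reading that divisor as a block duration in milliseconds (256
samples at 8 kHz) is an assumption, and the function and argument
names are illustrative only.

```python
def data_rate_kbps(frame_length, amp_bits, phase_bits, freq_bits=8):
    """Data rate in kbit/s for one parameter setting.

    Mirrors the expression used in the Appendix D listing:
    2 * L * (A + P + 8) / 32. The default of 8 frequency bits and
    the divisor 32 are taken from that listing; interpreting 32 as
    milliseconds per block is an assumption.
    """
    bits_per_block = 2 * frame_length * (amp_bits + phase_bits + freq_bits)
    return bits_per_block / 32.0


if __name__ == "__main__":
    # Operating point reported in the text: L=18, A=6, P=2.
    print(data_rate_kbps(18, 6, 2))
    # Reference setting used when one parameter is varied: L=64, A=P=12.
    print(data_rate_kbps(64, 12, 12))
```

With L=18, A=6, and P=2 the expression evaluates to 18 kbit/s,
which agrees with the rate quoted above.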
[Figure panels: SND and 4NFP18L6A2P, full view (time axis 0.0000
to 2.0000) and detail (time axis starting at 0.7577)]

Figure 4.7. Time Waveforms. Comparison between original
speech (SND) and reconstructed waveform with L=18, A=6, and
P=2 (4NFP18L6A2P). These parameters yield a data rate of
18 Kbps.
[Figure panels: SND and 4NFP18L6A2P time waveforms (0.7862 to
0.8362); 4NFP18L6A2P narrow-band spectrogram (0.0000 to 1.5352)]

Figure 4.8. Time Waveforms and Narrow-band Spectrograms.
Same as last figure, with another detail of the time
waveform and narrow-band spectrograms.
Figure 4.9. Time Waveforms. Comparison between input speech
(SND) and reconstructed waveforms with A=8, 6, and 4.
[Figure panels: SND, 4NFP8A, 4NFP6A, 4NFP4A; time axis 0.7862
to 0.8362]

Figure 4.10. Detailed Time Waveforms. Same as last figure
with increased detail.
[Figure panels: narrow-band spectrograms, including 4NFP4A;
time axis 0.0000 to 1.5405]

Figure 4.11. Narrow-band Spectrograms. These spectrograms
correspond to the waveforms of Figure 4.9.
[Figure panels: time waveforms, including 4NFP2P]

Figure 4.12. Time Waveforms. Comparison between input speech
(SND) and reconstructed waveforms using P=6, 4, and 2.
[Figure panels: SND, 4NFP6P, 4NFP4P, 4NFP2P; time axis 0.7862
to 0.8362]

Figure 4.13. Detailed Time Waveforms. Same as last figure
with increased detail.
[Figure panels: narrow-band spectrograms, including 4NFP4P;
time axis 0.0000 to 1.5352]

Figure 4.14. Narrow-band Spectrograms. These spectrograms
correspond to the waveforms of Figure 4.12.
[Figure 4.15 panels: time waveforms SND, 4NFP32L, 4NFP28L;
time axis 0.0000 to 2.0000]
[Figure panels: SND, 4NFP20L, 4NFP18L, 4NFP16L; time axis 0.0000
to 2.0000]

Figure 4.16. Time Waveforms. Same as last figure for L=20,
18, and 16.
[Figure panels: SND, 4NFP32L, 4NFP28L, 4NFP24L; time axis 0.7862
to 0.8362]

Figure 4.17. Detailed Time Waveforms. Detail of Figure 4.15.


[Figure panels: SND, 4NFP20L, 4NFP18L, 4NFP16L; time axis 0.7862
to 0.8362]

Figure 4.18. Detailed Time Waveforms. Detail of Figure 4.16.
[Figure panels: 4NFP32L, 4NFP28L; time axis 0.0000 to 1.5352]

Figure 4.19. Narrow-band Spectrograms. These spectrograms
correspond to the waveforms of Figure 4.15.
[Figure panels: one unlabeled panel, 4NFP20L, 4NFP18L, 4NFP16L;
time axis 0.0000 to 1.5352]

Figure 4.20. Narrow-band Spectrograms. These spectrograms
correspond to the waveforms of Figure 4.16.
V. Conclusions and Recommendations


The  purpose  of  this  chapter  is  to  draw  conclusions 
about  the  performance  of  this  system  and  give  recommendations 
for  further  research  in  this  area. 

Conclusions 

This thesis showed that a sinusoidal speech model with
rule-based spectral selection can achieve relatively low data
rates (18 Kbps). The data rate is minimized by minimizing
three parameters: the number of spectral components that
characterize speech (i.e., the frame length), the number of
amplitude quantization bits, and the number of phase
quantization bits. The parameter with the most effect on the
data rate is the frame length (see expression 3.1). Frame
lengths shorter than eighteen generated a "birdie" noise in
the reconstructed speech. On the other hand, the output of
the system showed no audible degradation for amplitude bits
down to six and phase bits down to two.
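
To make the bit counts above concrete, the following sketch shows
the step sizes they imply. It assumes the uniform quantizers of the
Appendix D listing, where the amplitude step is vmax/2^A and the
phase step is 2*pi/2^P. The function names, the choice of test
values, and the tie-breaking toward the lower level are
illustrative assumptions, not a transcription of the Fortran
routines.

```python
import math


def quantize_uniform(value, step):
    """Nearest multiple of `step`; exact ties go to the lower level."""
    return step * math.ceil(value / step - 0.5)


def quantize_amplitude(amp, bits, vmax):
    """Uniform amplitude quantizer with step vmax / 2**bits.

    Amplitudes at or above vmax are clipped to vmax.
    """
    if amp >= vmax:
        return vmax
    return quantize_uniform(amp, vmax / 2 ** bits)


def quantize_phase(phase, bits):
    """Uniform phase quantizer with step 2*pi / 2**bits.

    The magnitude is quantized and the sign is restored afterwards.
    """
    step = 2 * math.pi / 2 ** bits
    level = quantize_uniform(abs(phase), step)
    return -level if phase < 0 else level


if __name__ == "__main__":
    # With two phase bits the step is pi/2, so every phase is
    # mapped to a multiple of a quarter turn.
    print(quantize_phase(1.0, 2), quantize_phase(-3.0, 2))
    # With six amplitude bits and a hypothetical vmax of 6400,
    # the step is 100.
    print(quantize_amplitude(130.0, 6, 6400.0))
```

With P=2 the phase grid has a spacing of a quarter turn, which
shows how coarse the phase representation is at the operating
point where no audible degradation was reported.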

A  side  result  from  this  encoding/decoding  process  is 
the  elimination  of  non-harmonic  noise  from  speech,  because 
speech  is  synthesized  from  a  selection  of  harmonics  of  the 
fundamental  frequency  (5) . 
Recommendations


The "birdie" noise in the output for small frame lengths
suggests the first recommendation: a process for expanding
the spectral content of the received frames should be
investigated. Further work can also be done on the
statistics of the speech spectral amplitudes and phases, to
improve their respective quantizers.
Appendix A: Amplitude Quantization Bits - Sample Values


This Appendix shows the effects of the number of
amplitude quantization bits for several frame lengths. To
suppress the effects of the number of phase quantization
bits, this number was set to a high value (12 bits). The code
that appears below the pictures has the following meaning:


Example: 4NFP18L8A

4N  - 4 Neighbors
FP  - Fixed Pitch
18L - Frame Length is 18
8A  - Number of Amplitude Quantization Bits is 8
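
The naming scheme can be decoded mechanically. The sketch below is
an illustration only: the function name and the returned field
names are hypothetical. It relies on the fields listed in the
example, together with the VP (variable pitch) and P (phase bits)
suffixes that appear in the figure captions of Chapter IV.

```python
import re

# <n>N, then FP or VP, then optional <n>L, <n>A, <n>P in that order.
_CODE = re.compile(r"^(\d+)N(FP|VP)(?:(\d+)L)?(?:(\d+)A)?(?:(\d+)P)?$")


def parse_figure_code(code):
    """Split a label such as '4NFP18L8A' into its fields.

    Fields that are absent from the label are returned as None.
    """
    match = _CODE.match(code)
    if match is None:
        raise ValueError("unrecognized figure code: %r" % code)
    neighbors, pitch, length, amp, phase = match.groups()

    def to_int(text):
        return int(text) if text is not None else None

    return {
        "neighbors": int(neighbors),
        "pitch": "fixed" if pitch == "FP" else "variable",
        "frame_length": to_int(length),
        "amplitude_bits": to_int(amp),
        "phase_bits": to_int(phase),
    }


if __name__ == "__main__":
    print(parse_figure_code("4NFP18L8A"))
    print(parse_figure_code("4NFP18L6A2P"))
```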


[Figures: narrow-band spectrograms (time axis 0.0000 to 1.5352)
and detailed time waveforms (time axis 0.7577 to 0.7977).
Recoverable panel labels: 4NFP32L8A, 4NFP32L6A, 4NFP32L4A,
4NFP32L2A, 4NFP20L8A, 4NFP20L6A, 4NFP20L4A, 4NFP18L8A,
4NFP14L8A, 4NFP14L6A, 4NFP14L2A]
Appendix B: Phase Quantization Bits - Sample Values


This Appendix shows the effects of the number of
phase quantization bits for several different frame lengths.
In order to reduce the effects of the number of amplitude
quantization bits, this number was set to a high value (12).
The code that appears below the pictures has the following
meaning:


Example: 4NFP22L6P

4N  - 4 Neighbors
FP  - Fixed Pitch
22L - Frame Length is 22
6P  - Number of Phase Quantization Bits is 6

[Figures: narrow-band spectrograms (time axis 0.0000 to 1.5352)
and detailed time waveforms (time axis 0.7577 to 0.7977).
Recoverable panel labels: 4NFP32L1P, 4NFP24L2P, 4NFP24L1P,
4NFP20L6P, 4NFP20L4P, 4NFP20L1P, 4NFP18L4P, 4NFP18L2P,
4NFP16L6P, 4NFP16L4P, 4NFP16L2P, 4NFP14L6P, 4NFP14L4P,
4NFP14L1P]

Appendix C: Data Flow Diagrams


Appendix D: Source Code

**********************************************************
*                                                        *
*  Title    : Main Program - Speech Analysis/Synthesis   *
*  Author   : Luis Alenquer                              *
*  Date     : November 1989                              *
*  Language : Fortran 77                                 *
*  Computer : VAX 11/780                                 *
*                                                        *
*  This program encodes a speech file into three frames  *
*  containing frequency, amplitude, and phase            *
*  information. The original speech is reconstructed     *
*  from these values. The first part is accomplished     *
*  in the subprogram "Speech Analyser", and the          *
*  reconstruction is done in the "Speech Synthesizer".   *
*                                                        *
**********************************************************

      integer*4 hdata(64)
      character*32 in, out
      integer*2 timeFR(256), ntimeFR(256)
      integer n, i, qapb, qphb, size1
      parameter (size1=18)
      real rfrFR(size1), qapFR(size1), qapFRp(size1),
     +     qapFRi(size1)
      real qphFR(size1), qphFRp(size1), qphFRi(size1)
      real rfrFRp(size1), rfrFRi(size1), vmax
      logical begin, end

      begin=.true.
      end=.false.

      write (*,4) ' Enter the name of input file: '
      read (*,6) in
      write (*,4) ' Enter the name of output file: '
      read (*,6) out
      write (*,4) ' Enter # of ampl. quantization bits:'
      read (*,*) qapb
      write (*,4) ' Enter # of phase quantization bits:'
      read (*,*) qphb
      write (*,4) ' Enter quantizer max amplitude: '
      read (*,*) vmax
    4 format (a,$)
    6 format (a)

      call datarate(size1, qapb, qphb)

      open (10, file=in, status='old', access='sequential',
     +      form='unformatted', recordtype='fixed',
     +      recl=128)
      open (unit=99, file=out, status='new',
     +      access='sequential', form='unformatted',
     +      recordtype='fixed', recl=128)

      read (10) hdata
   15 write (99) hdata

   20 call analyzer(begin, timeFR, size1, qapb, qphb, vmax,
     +              qapFRp, qapFRi, qphFRp, qphFRi, rfrFRp,
     +              rfrFRi, end)
      if (.not.end) then
         if (.not.begin) then
            do 30 i=1,size1
               qapFR(i)=qapFRp(i)
               qphFR(i)=qphFRp(i)
   30       rfrFR(i)=rfrFRp(i)
            call synthesizer(begin, end, size1, qapFR,
     +                       qphFR, rfrFR, ntimeFR)
         endif
         do 40 i=1,size1
            qapFR(i)=qapFRi(i)
            qphFR(i)=qphFRi(i)
   40    rfrFR(i)=rfrFRi(i)
      endif
      call synthesizer(begin, end, size1, qapFR, qphFR,
     +                 rfrFR, ntimeFR)
      begin=.false.
      if (.not.end) goto 20
      end


**********************************************************
*                                                        *
*  Subroutine Speech Analyzer                            *
*                                                        *
*  Top-level module of the speech encoding process.      *
*  The original speech file is analysed in the           *
*  frequency domain to produce three frames containing   *
*  frequency, amplitude and phase information.           *
*                                                        *
**********************************************************

      subroutine analyzer(begin, timeFR, size1, qapb, qphb,
     +                    vmax, qapFRp, qapFRi, qphFRp,
     +                    qphFRi, rfrFRp, rfrFRi, end)
      integer size, size1
      parameter (size=18)
      integer*2 timeFR(256)
      integer i, qapb, qphb
      real timFRp(256), timFRi(256)
      real dsapFR(256), dphFR(256)
      real rapFR(size), rphFR(size)
      real rfrFR(size), qapFR(size)
      real qapFRp(size1), qapFRi(size1)
      real qphFR(size)
      real qphFRp(size1), qphFRi(size1)
      real rfrFRp(size1), rfrFRi(size1), vmax
      logical flag1, flag2, begin, end

      end=.false.
      if (begin) then
         open (50, access='sequential', form='unformatted',
     +         file='50tmp', status='new')
         open (60, access='sequential', form='unformatted',
     +         file='60tmp', status='new')
      endif

      read (10,end=200) timeFR
      call hamming(begin, timeFR, timFRp, timFRi)
      if (.not.begin) then
         call freqconv(flag1, flag2, timFRp, dsapFR, dphFR)
         call dataextraction(size, dsapFR, dphFR, rapFR,
     +                       rphFR, rfrFR)
         do 5 i=1,size
    5    rfrFRp(i)=rfrFR(i)
         call outputformatting(size, qapb, qphb, vmax,
     +                         rapFR, rphFR, qapFR, qphFR)
         do 10 i=1,size
            qapFRp(i)=qapFR(i)
   10    qphFRp(i)=qphFR(i)
      endif

      call freqconv(flag1, flag2, timFRi, dsapFR, dphFR)
      call dataextraction(size, dsapFR, dphFR, rapFR, rphFR,
     +                    rfrFR)
      call outputformatting(size, qapb, qphb, vmax, rapFR,
     +                      rphFR, qapFR, qphFR)
      do 20 i=1,size
         qapFRi(i)=qapFR(i)
         qphFRi(i)=qphFR(i)
   20 rfrFRi(i)=rfrFR(i)
      return

  200 end=.true.
      end

**********************************************************
*                                                        *
*  Subroutine Hamming Windows                            *
*                                                        *
*  Hamming Windows with 50% overlap are applied to       *
*  the incoming speech frames.                           *
*                                                        *
**********************************************************

      subroutine hamming(begin, timeFR, timFRp, timFRi)
      real timFRi(256), timFRp(256)
      integer*2 timeFR(256)
      integer j, k
      dimension x(256), z(256), w(256)
      real x, z, pi, w
      logical begin
      save x
      data pi/3.14159265358979324/

      if (.not.begin) goto 120
  110 do 115 j=1,256
         x(j)=timeFR(j)
         w(j)=x(j)*(0.54-0.46*cos((2*pi/255)*(j-1)))
  115 timFRi(j)=w(j)
      goto 199

  120 do 125 j=1,128
         z(j)=x(j+128)
  125 z(j+128)=timeFR(j)
      do 130 j=1,256
  130 timFRp(j)=z(j)*(0.54-0.46*cos((2*pi/255)*(j-1)))
      goto 110
  199 continue
      end


**********************************************************
*                                                        *
*  Subroutine Frequency Conversion                       *
*                                                        *
*  An input time frame is converted to the               *
*  frequency domain, resulting in two frames with        *
*  spectral amplitudes and phases.                       *
*                                                        *
**********************************************************

      subroutine freqconv(flag1, flag2, tiFR, dsapFR, dphFR)
      real apFR(256), tiFR(256), phFR(256), sapFR(256),
     +     apFR1(256), apFR2(256), phase(256), dsapFR(256),
     +     dphFR(256), apFR3(256)
      logical flag1, flag2
      integer i
*     save apFR1, apFR2, apFR3, phase
      save

      call dft(tiFR, apFR, phFR)
      do 10 i=1,256
         dsapFR(i)=apFR(i)
   10 dphFR(i)=phFR(i)
      end


**********************************************************
*                                                        *
*  Subroutine Discrete Fourier Transform                 *
*                                                        *
*  This module calls a Fast Fourier Transform            *
*  subroutine to take a 512-point FFT of an              *
*  incoming time frame. The output of the FFT is         *
*  converted to amplitudes and phases.                   *
*                                                        *
**********************************************************

      subroutine dft(tiFR, apFR, phFR)
      real yy(512), xx(512), apFR(256), phFR(256), tiFR(256)
      real a, b, pi
      integer i
*     pi is used below but was not defined in the listing.
      data pi/3.14159265358979324/

      do 10 i=1,512
   10 yy(i)=0
      do 20 i=1,256
         xx(i)=tiFR(i)
   20 xx(i+256)=0
*     The listing is truncated after "xx" in this call. The
*     imaginary array yy and ntype=1 (forward, unscaled)
*     are assumed.
      call fft(9, xx, yy, 1)
      do 30 i=1,256
         a=xx(i)
         b=yy(i)
         apFR(i)=(sqrt(a**2+b**2)*1.7)
         if (a.eq.0) then
            phFR(i)=pi/2
         else
            phFR(i)=atan2(b,a)
         endif
   30 continue
      end



**********************************************************
*                                                        *
*  Subroutine Fast Fourier Transform (10:457)            *
*                                                        *
**********************************************************

      subroutine fft(log2n, xr, xi, ntype)
      dimension xr(1), xi(1)
      integer log2n, ntype, i, j, k, n, nv2, nm1, l, le,
     +        le1, ip
      real xi, xr, tr, ti, ur, ui, wr, wi, ain, sign, pi
      data pi/3.14159265358979324/

      sign=-1.
      if (ntype.lt.0) sign=1.
      n=2**log2n
      nv2=n/2
      nm1=n-1
      j=1
      do 7 i=1,nm1
         if (i.ge.j) goto 5
         tr=xr(j)
         ti=xi(j)
         xr(j)=xr(i)
         xi(j)=xi(i)
         xr(i)=tr
         xi(i)=ti
    5    k=nv2
    6    if (k.ge.j) goto 7
         j=j-k
         k=k/2
         goto 6
    7 j=j+k

      do 20 l=1,log2n
         le=2**l
         le1=le/2
         ur=1.
         ui=0.
         wr=cos(pi/le1)
         wi=sign*sin(pi/le1)
         do 20 j=1,le1
            do 10 i=j,n,le
               ip=i+le1
               tr=xr(ip)*ur-xi(ip)*ui
               ti=xr(ip)*ui+xi(ip)*ur
               xr(ip)=xr(i)-tr
               xi(ip)=xi(i)-ti
               xr(i)=xr(i)+tr
   10       xi(i)=xi(i)+ti
            tr=ur*wr-ui*wi
            ti=ur*wi+ui*wr
            ur=tr
   20 ui=ti

      if (ntype.gt.0) return
      ain=1./n
      do 30 i=1,n
         xr(i)=xr(i)*ain
   30 xi(i)=xi(i)*ain
      return
      end


**********************************************************
*                                                        *
*  Subroutine Data Extraction                            *
*                                                        *
*  This module controls the encoding process.            *
*                                                        *
**********************************************************

      subroutine dataextraction(size, dsapFR, dphFR, rapFR,
     +                          rphFR, rfrFR)
      integer size
      real dsapFR(256), dphFR(256), rapFR(size),
     +     rphFR(size), rfrFR(size), ssapFR(256)

      call harmpeakselect(size, dsapFR, ssapFR)
      call freqFR(size, ssapFR, rfrFR)
      call ampFR(size, ssapFR, rapFR)
      call phaseFR(size, rfrFR, dphFR, rphFR)
      end


**********************************************************
*                                                        *
*  Subroutine Harmonics/Peaks Selection                  *
*                                                        *
*  Based on the energy of a frame, this module chooses   *
*  a harmonic or a peak selection process.               *
*                                                        *
**********************************************************

      subroutine harmpeakselect(size, dsapFR, ssapFR)
      integer size, i
*     ssapFR is dimensioned 256 here to match the caller and
*     the callees (the listing declared it with length size).
      real dsapFR(256), ssapFR(256), energy, vapFR(256),
     +     uapFR(256)
      external energy

      if (energy(dsapFR).gt.9.e+5) then
         do 10 i=1,256
   10    vapFR(i)=dsapFR(i)
         call harmselect(size, vapFR, ssapFR)
      else
         do 20 i=1,256
   20    uapFR(i)=dsapFR(i)
         call peakselect(size, uapFR, ssapFR)
      endif
      end


**********************************************************
*                                                        *
*  Function Energy of a Frame                            *
*                                                        *
*  The energy of a frame is computed from its spectral   *
*  amplitudes.                                           *
*                                                        *
**********************************************************

      real function energy(dsapFR)
      real dsapFR(256)
      integer i

      energy=0.
      do 10 i=1,256
   10 energy=energy+dsapFR(i)**2
      energy=sqrt(energy)
      write(*,*) ' '
      write(*,*) 'energy:'
      write(*,*) energy
      write(*,*) ' '
      end


**********************************************************
*                                                        *
*  Subroutine Harmonic Selection                         *
*                                                        *
*  This is a control module that encapsulates the        *
*  harmonics selection process.                          *
*                                                        *
**********************************************************

      subroutine harmselect(size, vapFR, ssapFR)
      integer size
      real vapFR(256), ssapFR(256), hFR(256)

      call harmdetect(vapFR, hFR)
      call threshold(size, hFR, ssapFR)
      end

**********************************************************
*                                                        *
*  Subroutine Harmonic Detection                         *
*                                                        *
*  Detects harmonics based on a fixed pitch and a        *
*  four neighbors rule.                                  *
*                                                        *
**********************************************************

      subroutine harmdetect(vapFR, hFR)
      integer i, j, n
      real vapFR(256), hFR(256), harm, a, b, c, d, e

      do 10 i=1,256
   10 hFR(i)=0.0

      n=8
      j=n+2
      do 20 i=n+2,255
         harm=0.0
         if (i.eq.j) then
*           Keep the expected harmonic bin and its two
*           neighbors on each side.
            a=vapFR(i-1)
            b=vapFR(i)
            c=vapFR(i+1)
            d=vapFR(i-2)
            e=vapFR(i+2)
            harm=max(a,b,c,d,e)
            hFR(i)=b
            hFR(i-1)=a
            hFR(i+1)=c
            hFR(i-2)=d
            hFR(i+2)=e
*           The listing reads "if (j.le.12) then n=j"; a
*           logical IF is assumed here.
            if (j.le.12) n=j
            a=0.0
            b=0.0
            c=0.0
            d=0.0
            e=0.0
            j=j+n
         endif
   20 continue
      end

**********************************************************
*                                                        *
*  Subroutine Variable Amplitude Threshold               *
*                                                        *
*  Sets up a variable amplitude threshold to select      *
*  a desired number of amplitude components.             *
*                                                        *
**********************************************************

      subroutine threshold(size, hFR, ssapFR)
      integer size, n
      real hFR(256), ssapFR(256), a, b, x(256), d, c, e, newa
      logical lower, greater, incr, decr, flag

      d=hFR(8)
      c=hFR(9)
      e=hFR(10)
      a=max(d,c,e)/4
      if (a.eq.0.0) then
         a=100.0
      endif

      lower=.false.
      greater=.false.
      incr=.false.
      decr=.false.
      flag=.false.

    5 n=256
      do 10 i=1,256
   10 x(i)=hFR(i)
      do 20 i=1,256
         b=a/sqrt(1.0+(i/42.0)**2.0)
         if (x(i).le.b) then
            x(i)=0.0
            n=n-1
         endif
   20 continue

      if (n.lt.size) then
         if (a.eq.0.0) goto 25
         lower=.true.
         greater=.false.
      endif
      if (n.gt.size) then
         greater=.true.
         lower=.false.
      endif

      if (((.not.decr).and.(greater.or.incr)).and.(n.ne.size))
     +   then
         call increase(flag, greater, a, newa)
         incr=.true.
         a=newa
         goto 5
      endif
      if ((decr.or.(lower.and.(.not.incr))).and.(n.ne.size))
     +   then
         call decrease(flag, lower, a, newa)
         decr=.true.
         a=newa
         goto 5
      endif

      flag=.false.
   25 do 30 i=1,256
   30 ssapFR(i)=x(i)
      end

**********************************************************
*                                                        *
*  Subroutine Peak Selection                             *
*                                                        *
*  This is a control module that encapsulates the        *
*  peaks selection process.                              *
*                                                        *
**********************************************************

      subroutine peakselect(size, uapFR, ssapFR)
      integer size
      real uapFR(256), ssapFR(256), pFR(256)

      call peakdetect(uapFR, pFR)
      call threshold(size, pFR, ssapFR)
      end


**********************************************************
*                                                        *
*  Subroutine Peak Detection                             *
*                                                        *
*  No peak detection is done. Nevertheless, the module   *
*  is maintained so it can be used in future research.   *
*                                                        *
**********************************************************

      subroutine peakdetect(uapFR, pFR)
      integer i
      real uapFR(256), pFR(256), a, b, c

      do 30 i=1,256
   30 pFR(i)=uapFR(i)
      end

**********************************************************
*                                                        *
*  Subroutine Frequency Frame Generation                 *
*                                                        *
*  The frequency of the selected amplitude components    *
*  is written into an array.                             *
*                                                        *
**********************************************************

      subroutine freqFR(size, ssapFR, rfrFR)
      integer size, i, j
      real ssapFR(256), rfrFR(size)

      do 5 i=1,size
    5 rfrFR(i)=0.0

      j=0
      do 10 i=1,256
         if (ssapFR(i).ne.0.0) then
            j=j+1
            rfrFR(j)=i
         endif
   10 continue
      end


**********************************************************
*                                                        *
*  Subroutine Amplitude Frame Generation                 *
*                                                        *
*  The selected amplitude components are written         *
*  into a reduced length frame.                          *
*                                                        *
**********************************************************

      subroutine ampFR(size, ssapFR, rapFR)
      integer size, i, j
      real ssapFR(256), rapFR(size)

      do 5 i=1,size
    5 rapFR(i)=0.0

      j=0
      do 10 i=1,256
         if (ssapFR(i).ne.0.0) then
            j=j+1
            rapFR(j)=ssapFR(i)
         endif
   10 continue
      end

**********************************************************
*                                                        *
*  Subroutine Phase Frame Generation                     *
*                                                        *
*  According to the frequency of the selected            *
*  components, the original phase frame is               *
*  rewritten into a smaller size frame.                  *
*                                                        *
**********************************************************

      subroutine phaseFR(size, rfrFR, dphFR, rphFR)
      integer size, i, j
      real dphFR(256), rfrFR(size), rphFR(size)

      do 5 i=1,size
    5 rphFR(i)=0.0

      do 10 i=1,size
         j=rfrFR(i)
         if (j.ne.0) then
            rphFR(i)=dphFR(j)
         endif
   10 continue
      end


**********************************************************
*                                                        *
*  Subroutine Increase                                   *
*                                                        *
*  Works in conjunction with the Variable Amplitude      *
*  Threshold, by furnishing to this module new           *
*  values for the threshold.                             *
*                                                        *
**********************************************************

      subroutine increase(flag, greater, a, newa)
      real a, newa, step, lasta
      logical greater, flag
      save

      if (greater) then
         if (.not.flag) step=100.0
         newa=a+step
         lasta=a
      endif
      if (.not.greater) then
         step=step/10.0
         newa=lasta+step
         flag=.true.
      endif
      end


**********************************************************
*                                                        *
*  Subroutine Decrease                                   *
*                                                        *
*  Works in conjunction with the Variable Amplitude      *
*  Threshold module. It generates new values for         *
*  the threshold.                                        *
*                                                        *
**********************************************************

      subroutine decrease(flag, lower, a, newa)
      real a, newa, step, lasta
      logical lower, flag
      save

      if (lower) then
         if (.not.flag) step=100.0
         newa=a-step
         if (newa.le.0.0) newa=0.0
         lasta=a
      endif
      if (.not.lower) then
         step=step/10.0
         newa=lasta-step
         if (newa.le.0.0) newa=0.0
         flag=.true.
      endif
      end


**********************************************************
*                                                        *
*  Subroutine Output Formatting                          *
*                                                        *
*  This module encapsulates the quantizing process.      *
*                                                        *
**********************************************************

      subroutine outputformatting(size, qapb, qphb, vmax,
     +                            rapFR, rphFR, qapFR, qphFR)
      integer size, qapb, qphb
      real rapFR(size), rphFR(size), qapFR(size),
     +     qphFR(size), vmax

      call ampquantizer(size, qapb, vmax, rapFR, qapFR)
      call phasequantizer(size, qphb, rphFR, qphFR)
      end


**********************************************************
*                                                        *
*  Subroutine Amplitude Quantizer                        *
*                                                        *
*  This module implements a linear quantizer with        *
*  midtread at the origin. The quantization interval     *
*  is determined by the number of bits (qapb) and by     *
*  the dynamic range (vmax).                             *
*                                                        *
**********************************************************

      subroutine ampquantizer(size, qapb, vmax, rapFR, qapFR)
      integer i, size, qapb
      real rapFR(size), qapFR(size), q, level, vmax, a,
     +     c, qapl

      qapl=real(2**qapb)
      q=vmax/qapl
      do 20 i=1,size
         c=aint(rapFR(i)/q)
         level=c*q-q
         if (rapFR(i).ge.vmax) then
            qapFR(i)=vmax
            goto 20
         endif
   10    level=level+q
         a=level+q/2
         if ((rapFR(i).gt.a).and.
     +       (rapFR(i).lt.vmax)) goto 10
         qapFR(i)=level
   20 continue
      end

**********************************************************
*                                                        *
*  Subroutine Phase Quantizer                            *
*                                                        *
*  This module implements a linear quantizer with        *
*  midtread at the origin. The quantization interval     *
*  is determined by the number of bits (qphb).           *
*                                                        *
**********************************************************

      subroutine phasequantizer(size, qphb, rphFR, qphFR)
      integer i, size, qphb
      real rphFR(size), qphFR(size), q, level, pi, a, c, qphl
      logical tst
      data pi/3.14159265358979324/

      qphl=real(2**qphb)
      q=2*pi/qphl
      do 20 i=1,size
         c=aint(abs(rphFR(i))/q)
         level=c*q-q
   10    level=level+q
         a=level+q/2
         if (abs(rphFR(i)).gt.a) goto 10
         if (rphFR(i).lt.0.0) then
            qphFR(i)=-level
         else
            qphFR(i)=level
         endif
   20 continue
      end


**********************************************************
*                                                        *
*  Subroutine Data Rate Calculation                      *
*                                                        *
*  Computes the data rate based on the length of         *
*  the frame (size1), and on the number of               *
*  amplitude (qapb) and phase (qphb) quantization        *
*  bits.                                                 *
*                                                        *
**********************************************************

      subroutine datarate(size1, qapb, qphb)
      integer size1, qapb, qphb
      real rate

*     Divide by 32.0 so the result is not truncated by
*     integer division.
      rate=(2*size1*(qapb+qphb+8))/32.0
*     Only five output items survive in the listing; the two
*     bare write statements and their positions are assumed.
      write(*,*)
      write(*,*) size1
      write(*,*) qapb
      write(*,*) qphb
      write(*,*)
      write(*,*) 'Data Rate [kbit/s]:'
      write(*,*) rate
      end



**********************************************************
*                                                        *
*  Subroutine Synthesizer                                *
*                                                        *
*  This module encapsulates and controls the speech      *
*  reconstruction process.                               *
*                                                        *
**********************************************************

      subroutine synthesizer(begin, end, size1, qapFR, qphFR,
     +                       rfrFR, ntimeFR)
      integer*2 ntimeFR(256)
      integer size1
      real qapFR(size1), qphFR(size1), rfrFR(size1),
     +     apFR(256), phFR(256), gtimeFR(256), rtimeFR(256)
      logical begin, end, valid

      if (.not.end) then
         call frgeneration(size1, qapFR, qphFR, rfrFR, apFR,
     +                     phFR)
         call tdconvertion(apFR, phFR, gtimeFR)
      endif
      call datareduction(begin, end, gtimeFR, rtimeFR, valid)
      if (valid) then
         call ampnorm(rtimeFR, ntimeFR, end)
      endif
      end
**********************************************************
*                                                        *
*         Subroutine 256-point Frame Generation          *
*                                                        *
*  Expands the reduced-length frames to 256-point        *
*  frames by using the frequency information.            *
*                                                        *
**********************************************************

      subroutine frgeneration(sizel, qapFR, qphFR, rfrFR,
     +                        apFR, phFR)

      integer sizel, i, j
      real qapFR(sizel), qphFR(sizel), rfrFR(sizel),
     +     apFR(256), phFR(256)

*     Clear the full-length amplitude and phase frames.
      do 10 i=1,256
        apFR(i)=0.0
10    phFR(i)=0.0

*     Place each component at the bin given by its frequency.
      do 20 i=1,sizel
        j=rfrFR(i)
        apFR(j)=qapFR(i)
20    phFR(j)=qphFR(i)

      end


**********************************************************
*                                                        *
*           Subroutine Time Domain Conversion            *
*                                                        *
*  Uses a speech sinusoidal model to produce a time      *
*  waveform from the amplitudes and phase values.        *
*                                                        *
**********************************************************

      subroutine tdconvertion(apFR, phFR, gtimeFR)

      integer i, j
      real apFR(256), phFR(256), gtimeFR(256), pi
      data pi/3.14159265358979324/

*     Sum one cosine per nonzero spectral component for each
*     output sample j.
      do 20 j=1,256
        gtimeFR(j)=0.0
        do 10 i=1,256
          if (apFR(i).gt.0.0) then
            gtimeFR(j)=gtimeFR(j)+(apFR(i)*
     +        cos(((2*pi*(i-1)*(j-1))/512)+phFR(i)))
          endif
10      continue
20    continue

      end
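
*
*  In equation form, as a reading of the loops in tdconvertion
*  (the symbols A and P are introduced here only as shorthand
*  for apFR and phFR), each output sample j=1..256 is
*
*     gtimeFR(j) = sum over i=1..256, A(i)>0, of
*                  A(i)*cos(2*pi*(i-1)*(j-1)/512 + P(i))
*
*  Bin i therefore corresponds to (i-1)/512 cycles per sample,
*  so the 256 bins span frequencies from 0 up to just under
*  half the sampling rate.
*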


**********************************************************
*                                                        *
*               Subroutine Data Reduction                *
*                                                        *
*  The number of frames that was doubled by the          *
*  Hamming Windows module, in the Analyser, is reduced   *
*  to its original number.                               *
*                                                        *
**********************************************************

      subroutine datareduction(begin, end, gtimeFR, rtimeFR,
     +                         valid)

      integer n, i, k
      real gtimeFR(256), rtimeFR(256), x(256), y(256),
     +     htimeFR(256)
      logical begin, end, valid
      save

      if (.not.end) then
        if (begin) then
*         First frame: window it and hold it in x.
          n=1
          call hamm(gtimeFR, htimeFR)
          valid=.false.
          do 10 i=1,256
10        x(i)=htimeFR(i)
        else
          n=n+1
          k=mod(n,2)
          if (k.eq.0) then
*           Even frame: overlap its first half with the second
*           half of x and output a complete frame.
            call hamm(gtimeFR, htimeFR)
            do 20 i=1,256
20          y(i)=htimeFR(i)
            do 30 i=1,128
              rtimeFR(i)=x(i)
30          rtimeFR(i+128)=x(i+128)+y(i)
            valid=.true.
          else
*           Odd frame: overlap its first half with the second
*           half of y and hold the result in x.
            call hamm(gtimeFR, htimeFR)
            do 40 i=1,256
40          x(i)=htimeFR(i)
            do 50 i=1,128
50          x(i)=x(i)+y(i+128)
            valid=.false.
          endif
        endif
      else
*       End of input: output the frame held in x.
        do 60 i=1,256
60      rtimeFR(i)=x(i)
        valid=.true.
      endif

      end
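
*
*  Trace of the overlap-add in datareduction for successive
*  windowed frames h1, h2, h3, h4 (a reading of the code; the
*  names h1..h4 and the a:b range notation are used here only
*  for illustration):
*
*    call 1 (begin)  x = h1
*                    valid=.false.
*    call 2          rtimeFR(1:128)   = h1(1:128)
*                    rtimeFR(129:256) = h1(129:256)+h2(1:128)
*                    valid=.true.
*    call 3          x(1:128)   = h3(1:128)+h2(129:256)
*                    x(129:256) = h3(129:256)
*                    valid=.false.
*    call 4          rtimeFR(1:128)   = x(1:128)
*                    rtimeFR(129:256) = x(129:256)+h4(1:128)
*                    valid=.true.
*
*  Each pair of half-overlapped input frames thus yields one
*  256-sample output frame, and the final call (end) copies
*  the contents of x to rtimeFR.
*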


**********************************************************
*                                                        *
*                   Subroutine Hamming                   *
*                                                        *
*  All the incoming time frames are Hamming windowed     *
*  before a data reduction is performed.                 *
*                                                        *
**********************************************************
      subroutine hamm(gtimeFR, htimeFR)
      integer i
      real gtimeFR(256), htimeFR(256), pi
      data pi/3.14159265358979324/

*     256-point Hamming window: 0.54-0.46*cos(2*pi*(i-1)/255).
      do 10 i=1,256
10    htimeFR(i)=gtimeFR(i)*(0.54-0.46*
     +  cos((2*pi/255)*(i-1)))

      end


**********************************************************
*                                                        *
*           Subroutine Amplitude Normalization           *
*                                                        *
*  Normalizes the reconstructed speech output.           *
*  This module was adapted from Bashir thesis (9:B-7).   *
*                                                        *
**********************************************************
      subroutine ampnorm(rtimeFR, ntimeFR, end)
      integer i
      integer*2 ntimeFR(256), s(256)
      real rtimeFR(256), amax, amaxl, x(256), z(256), p
      logical end

      if (.not.end) then
*       Buffer every reconstructed frame on unit 50.
        write(50) rtimeFR
      else
        write(50) rtimeFR
        rewind(50)
*       Pass 1: write the peak magnitude of each frame to unit 60.
800     read(50,end=820) z
        amax=0.0
        do 810 i=1,256
          x(i)=abs(z(i))
810     amax=max(amax,x(i))
        write(60) amax
        goto 800
*       Pass 2: find the largest peak over all frames.
820     amax=0.0
        rewind(60)
840     read(60,end=860) amaxl
        amax=max(amax,amaxl)
        goto 840
*       Pass 3: scale every frame to the 16-bit range, write it
*       to unit 99.
860     rewind(50)
        p=32760.0/amax
880     read(50,end=899) z
        do 890 i=1,256
890     s(i)=int(p*z(i))
        write(99) s
        goto 880
899     rewind(99)
      endif

      end